HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON
Abstract
Improving the energy efficiency of high performance scientific applications has become a critical demand. Software-controlled hardware solutions based on Dynamic Voltage and Frequency Scaling (DVFS) have proven effective. Although DVFS benefits green computing, it can itself incur non-negligible overhead when a large number of frequency switches are issued. In this paper, we propose a strategy to achieve the optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adaptively bounded by user-specified memory costs, for fewer DVFS switches; this saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme that fully exploits network bandwidth via pipeline broadcast. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gain (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an InfiniBand switch.
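The trade-off described in the abstract (more buffered work per step in exchange for fewer frequency switches) can be illustrated with a rough counting model. This is a minimal sketch, not the paper's implementation: it assumes each group of iterations costs two switches (down for communication, up for computation), and the names `choose_group_size`, `dvfs_switch_count`, `panel_bytes`, and `memory_budget_bytes` are hypothetical.

```python
import math


def choose_group_size(num_steps: int, panel_bytes: int, memory_budget_bytes: int) -> int:
    """Pick how many iterations to batch, limited by a user-specified memory budget.

    Assumption: batching g iterations requires buffering g panels at once.
    """
    if num_steps <= 0 or panel_bytes <= 0 or memory_budget_bytes <= 0:
        raise ValueError("all arguments must be positive")
    panels_that_fit = memory_budget_bytes // panel_bytes
    # Always batch at least one iteration, and never more than exist.
    return max(1, min(num_steps, panels_that_fit))


def dvfs_switch_count(num_steps: int, group_size: int) -> int:
    """Count frequency switches under the assumed two-switches-per-group model."""
    if num_steps <= 0 or group_size <= 0:
        raise ValueError("all arguments must be positive")
    num_groups = math.ceil(num_steps / group_size)
    # One switch to low frequency before communication,
    # one switch back to high frequency before computation.
    return 2 * num_groups


if __name__ == "__main__":
    steps = 16
    g = choose_group_size(steps, panel_bytes=100, memory_budget_bytes=450)
    print("group size:", g)
    print("switches, per-iteration:", dvfs_switch_count(steps, 1))
    print("switches, batched:", dvfs_switch_count(steps, g))
```

Under this model, a larger memory budget allows a larger group size and therefore fewer switches; the actual energy saved depends on per-switch overhead, which the sketch does not model.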
Similar resources
Mathematical Analysis of Optimal Tracking Interval Management for Power Efficient Target Tracking Wireless Sensor Networks
In this paper, we study the problem of power efficient tracking interval management for distributed target tracking wireless sensor networks (WSNs). We first analyze the performance of a distributed target tracking network with one moving object, using a quantitative mathematical analysis. We show that previously proposed algorithms are efficient only for constant average velocity objects howev...
Adaptive Algorithm Selection Using an Integrated Hybrid Performance Modeling Approach
Recent advances in parallel and distributed computing have made it very challenging for programmers to reach the performance potential of current systems. In addition, recent advances in numerical algorithms and software optimizations have tremendously increased the number of alternatives for solving a problem, which further complicates the software tuning process. Indeed, no single algorithm c...
Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling
Because of their high performance and flexibility, FPGAs are an attractive option for use in embedded systems, where both high performance and low energy consumption are important. Therefore, it is important to create FPGA designs that are not only high performance but also low energy. The flexibility of FPGAs facilitates their high performance, but also makes it difficult to design for them. T...
Generalizing of a High Performance Parallel Strassen Implementation on Distributed Memory MIMD Architectures
Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count from the O(n^3) of the traditional algorithm to O(n^2.81), so designing an efficient parallelization of this algorithm becomes essential. In this paper, we present our generalization of a parallel Strassen implementation that achieved very good performance on an Intel Paragon: 20% faster for n ≈ 1000 and more than 100%...
Effects of Dietary Protein and Energy Levels on Productive and Reproductive Performance of Lactating Buffaloes
Twenty eight lactating buffaloes were used in a completely randomized design with 2×2 factorial arrangement of four experimental diets including low protein–low energy (LP-LE), low protein–high energy (LP-HE), high protein–low energy (HP-LE) and high protein–high energy (HP-HE). Results showed that the HP-HE diet recorded the highest digestibility coefficients of CP, EE, NFE, nutritive values, ...
Publication date: 2014